LM Studio: Exploring AI Models from Your Desktop


LM Studio is a desktop app (Mac, Windows, Linux) that downloads and runs local LLMs behind a polished UI. No terminal, no complicated setup: open it, pick a model, chat. It suits exploratory developers, data analysts, journalists handling sensitive data, and anyone who wants to try LLMs without sending queries to the cloud.

This article covers what it offers, when it’s better than Ollama or OpenWebUI, and where it has limits.

What LM Studio Does

Main features:

  • Model download from Hugging Face with one click.
  • Local execution over llama.cpp (under the hood).
  • Polished chat UI.
  • Local OpenAI-compatible API that other apps can consume.
  • RAG with your documents (PDF, TXT, DOCX) — chat with your files.
  • Saved prompt management.
  • Side-by-side model comparison.

All in a desktop binary, no terminal, no YAML config.

Installation

Download it from lmstudio.ai: DMG for Mac, MSI for Windows, AppImage for Linux. Open it.

On first launch it asks you to select a model. Recommended starting points:

  • Mac Apple Silicon: Llama 3 8B Q4_K_M (~5GB) or Phi-3 Mini (3GB).
  • PC with 16GB RAM: Mistral 7B Q4 (~4GB) or Phi-3.
  • PC with 32GB+ RAM: Mixtral 8x7B Q4 (~25GB) or quantised Llama 3 70B (~40GB).

Download it, load it, and you're ready to chat.

Usage Experience

For a non-technical user:

  • UI with model selector at start.
  • Chat with visual parameters (temperature, top_p, context length).
  • File upload for local RAG.
  • Export/import conversations.
  • Pre-configured prompt templates for common cases.

For a developer:

  • OpenAI-compatible API server at localhost:1234.
  • Multiple models loaded simultaneously.
  • Logs of each query and tokens consumed.
  • GPU offloading configurable (CPU+GPU hybrid).

OpenAI-Compatible API

An underrated feature: LM Studio exposes an OpenAI-compatible API. Your existing code works:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="local-model",  # ignored; LM Studio uses the currently loaded model
    messages=[{"role": "user", "content": "Hi"}]
)

Useful for offline development, privacy-sensitive apps, or as a fallback if OpenAI goes down.
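The same endpoint can also be hit with nothing but the standard library, which is handy for quick checks without installing the openai package. A minimal sketch, assuming LM Studio's default port (1234) and that a model is already loaded:

```python
import json
import urllib.request

def build_chat_payload(prompt: str, model: str = "local-model",
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,  # LM Studio ignores this and uses the loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask_local(prompt: str, base_url: str = "http://localhost:1234/v1") -> str:
    """Send a chat completion to the local LM Studio server and return the reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask_local("Hi")` returns the model's reply; no API key is needed.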

Local RAG with Your Documents

LM Studio integrates ingestion and RAG:

  1. Drag PDFs/docs to the chat.
  2. The system extracts text and generates embeddings locally.
  3. Chat uses relevant context from your docs.

For lawyers, doctors, journalists with confidential data: zero cloud exposure. Document store stays local.
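Under the hood, the retrieval step boils down to embedding document chunks and ranking them by similarity to the query. A toy sketch of that step (the embedding vectors here are made up for illustration; LM Studio computes real ones locally):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_emb, chunks, k=2):
    """Rank (text, embedding) chunks by similarity to the query embedding."""
    scored = [(cosine_similarity(query_emb, emb), text) for text, emb in chunks]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy data: (chunk text, fake 3-dimensional embedding)
chunks = [
    ("Contract clause about liability", [0.9, 0.1, 0.0]),
    ("Appendix with office addresses",  [0.0, 0.2, 0.9]),
    ("Indemnification terms",           [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # fake embedding of "what are the liability terms?"
context = top_k_chunks(query, chunks)  # most relevant chunks go into the prompt
```

The top-ranked chunks are what gets injected into the chat context, which is why answers stay grounded in your own documents.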

Hardware and Performance

On Apple Silicon M2/M3:

  • Llama 3 8B Q4: 30-50 tokens/s on M2 Pro.
  • Mistral 7B Q4: similar.
  • Mixtral 8x7B Q4: 15-25 tokens/s on M3 Max 64GB.
  • Llama 3 70B Q4: 5-10 tokens/s if it fits unified memory.

On Windows with NVIDIA GPU:

  • RTX 4090: Llama 3 70B Q4 at ~15 tokens/s.
  • RTX 4070/4080: 7B-13B are sweet spot.
  • Laptop with 3050/4050: limited VRAM; CPU inference is often the better option.

CPU-only is viable for small models (3B) with slower but usable responses.
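A rough way to measure throughput on your own hardware is to time a completion and divide tokens generated by elapsed seconds. A sketch with the generation call injected as a function, so the timing logic stands alone (the real call would go through the local API):

```python
import time
from typing import Callable

def measure_tokens_per_second(generate: Callable[[], int]) -> float:
    """Time a generation call that returns the number of tokens it produced."""
    start = time.perf_counter()
    n_tokens = generate()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in generator that "produces" 120 tokens in ~0.1 s;
# swap in a real request against localhost:1234 to benchmark a model.
def fake_generate() -> int:
    time.sleep(0.1)
    return 120

tps = measure_tokens_per_second(fake_generate)  # roughly 1200 tokens/s here
```

Run the same prompt a few times and average: first runs include prompt processing, so steady-state numbers are what to compare against the figures above.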

LM Studio vs Ollama

Honest comparison:

| Aspect | LM Studio | Ollama |
|---|---|---|
| UI | Rich desktop | Minimal (CLI + optional web) |
| Installation | DMG/MSI installer | CLI binary |
| Models | Direct from Hugging Face | Own registry + GGUF |
| API | OpenAI-compatible | OpenAI-compatible |
| Built-in RAG | Yes | Via OpenWebUI |
| Multi-model loading | Yes | Yes |
| Linux | AppImage (beta) | Mature native |
| Target audience | Non-technical users + devs | Devs |
| License | Closed (free) | Open source (MIT) |

LM Studio wins for non-technical-user UX. Ollama wins for dev/CLI stack integration and open-source.

LM Studio vs OpenWebUI

OpenWebUI is a web UI for Ollama/other LLM backends.

| Aspect | LM Studio | OpenWebUI + Ollama |
|---|---|---|
| Deploy | Local desktop app | Docker container |
| Multi-user | No (single-user) | Yes |
| UI quality | Excellent | Very good |
| Self-hosted | Per user | For the team |
| Open-source | No | Yes |
LM Studio is personal / single-user. OpenWebUI is team / multi-user self-hosted.

Real Use Cases

Where we see LM Studio:

  • Developers testing models before deploying.
  • Data scientists iterating with LLMs without cloud.
  • Journalists and lawyers with confidential documents.
  • Students learning about LLMs without spending on APIs.
  • Small companies with laptop fleets and strict compliance.

Where it doesn’t fit:

  • Production servers (use Ollama/vLLM).
  • Simultaneous multi-user (use OpenWebUI).
  • Scaling with multiple concurrent sessions.
  • Non-GUI environments (SSH-only servers).

Limitations

Honestly:

  • Closed-source (not OSS), though free. Potential lock-in.
  • Update cadence depends on LM Studio team.
  • Not easily integrable into CI pipelines.
  • Single-machine: doesn’t distribute inference.
  • Telemetry is optional, but worth verifying in settings.

Performance Tuning

Three key tunings:

  • GPU layers: how many model layers are offloaded to the GPU. More = faster, but needs more VRAM.
  • Context length: max tokens. Lower = faster + less memory.
  • Thread count: for CPU inference, match physical cores (not HT logical).

Experiment with these until you find the right speed/memory balance for your hardware.
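A useful rule of thumb when tuning: a quantized model needs roughly (parameters × bits per weight ÷ 8) bytes, plus overhead for context and runtime buffers. A back-of-envelope sketch (the 20% overhead factor is an assumption for illustration, not an LM Studio figure):

```python
def approx_model_size_gb(params_billions: float, bits_per_weight: float,
                         overhead: float = 1.2) -> float:
    """Rough memory footprint of a quantized model, in GB."""
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# Llama 3 8B at ~4.5 bits/weight (typical for Q4_K_M) lands near
# the ~5GB quoted in the installation section.
size = approx_model_size_gb(8, 4.5)
```

If the estimate exceeds your VRAM, that's the signal to lower GPU layers or pick a smaller quantization.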

Model Recommendations

For Apple Silicon M2/M3:

  • General chat: Llama 3 8B Instruct Q4_K_M.
  • Code: DeepSeek Coder 6.7B Q4.
  • Spanish: Mixtral 8x7B if it fits.
  • Reasoning: Phi-3 Medium.

For modest hardware:

  • Phi-3 Mini (3.8B): excellent for size.
  • Gemma 2B: very light.
  • TinyLlama 1.1B: experimentation only.

Privacy and Data

LM Studio runs everything locally:

  • Models downloaded and stored on disk.
  • Chats stored in ~/.cache/lm-studio/.
  • RAG documents stay local.
  • Optional telemetry for analytics (check settings).
  • No mandatory cloud.

For sensitive data, that's a reasonable guarantee: nothing leaves your machine unless you enable it.

Conclusion

LM Studio is the best option for individuals wanting to explore local LLMs with polished UI. For teams, Ollama + OpenWebUI offers more flexibility. For production, neither — use vLLM or TGI. LM Studio occupies a specific but important niche: democratising local LLM access for non-technical users. Free and polished, it’s the obvious choice in its category. For people handling private data or wanting to experiment without paying for APIs, it’s worth downloading this afternoon.

Follow us on jacar.es for more on local LLMs, AI tools, and privacy.
